DRAFT-DRAFT-DRAFT
A Proposal for an Interactive Programming System
by
John R. Allen
Research Associate
Computer Science Department
Stanford University
Stanford Cal 94305
(415)497-4971
April 29, 1975
INTRODUCTION-- A proposal for an interactive programming laboratory.
"What is actually happening, I am afraid, is
that we all tell each other and ourselves that
software engineering techniques should be
improved considerably, because there is a
crisis. But there are a few boundary
conditions which apparently have to be
satisfied. I will list them for you:
1 We may not change our thinking habits.
2 We may not change our programming tools.
3 We may not change our hardware.
4 We may not change our tasks.
5 We may not change the organizational set-up
in which the work has to be done.
Now under these five immutable boundary
conditions, we have to try to improve matters.
This is utterly ridiculous."
--Dijkstra
For several years the programming community has been aware of the
rather poor quality of its product. Much of the current research
effort is devoted to the problem of reliability of programs.
Theoretical work on semantics and verification has given insight into
the nature of programming language constructs and has begun to
establish a firm foundation for program correctness. There is much
systems work dealing with programming environments --editors,
debuggers, and on-line aids. However it seems that little is being
done to affect the way people program. Verifiers will work on any
program, regardless of how poorly it was constructed; editors and
debuggers have no prejudice against sloppy programming. Yet one of
the most viable approaches to good programming style, that advocated
by Dijkstra, lobbies for such design. Indeed if we really expect
solutions to the problems of correctness or reliability we should
expect to develop a discipline in programming as rigorous as that
expected in mathematical reasoning. Even the most naive student of
logic is expected to understand his rules of inference and to combine
those schemas in an understandable and structured way. Yet no such
rigor is expected of programmers even though the difficulties in
constructing correct programs are as severe as those involved in
developing mathematical proofs. We should at least examine the
possibilities of imposing such a discipline on the programming
process.
We would propose two things. First, that we begin study of the
mechanics of an interactive programming laboratory which will help a
user informally construct programs in this style. This would be a
valuable experiment testing the validity of the claims for structured
programming and the design of the system would involve a reasonably
sophisticated piece of "human engineering". Second, we would propose
investigation of formal semantics specification which will aid in
correct construction of programs. For example, the verification
oriented rules of Hoare can be used as the basis of a command language
for an interactive program constructor. There are other means of
specifying semantics which can be applied in this manner. Such
techniques should be investigated.
Specifically we are advocating the style of programming using
abstraction followed by elaboration (also known as step-wise
refinement, structured programming,...) as the preferred style. It
is this style of specification of algorithms which lends itself
naturally to correct construction. It is our belief that this style
of programming, reinforced with interactive facilities can make a
significant contribution to the construction of reliable software.
If such techniques are indeed viable then this approach should lead
to a more reasoned approach to automatic programming. For then, the
automatic construction of a program is a machine-applied sequence of
interactive programming commands. Our philosophy is that of
McCarthy: "In order for a program to be capable of learning something
it must first be capable of being told it." Thus we would understand
the programming process before attempting to construct an automatic
programmer. Experience tells us that the programming process is
poorly understood.
We believe that any attempt to influence current programming style
must be tested against accepted practice. Like it or not, we are
faced with an enormous backlog of poorly trained programmers and we
must be able to demonstrate that our methodologies --regardless of how
valuable they are as theoretical tools-- have practical benefits.
We should be prepared to show that our techniques lead to more
reliable programs, more readable and more easily modifiable programs.
Thus the methodologies must either put a firm foundation on existing
practices, or if changes are advocated, must give convincing
demonstration of their worth. For these reasons we put emphasis on
informal techniques for constructing reliable and understandable
software.
" The comment was made by an experienced
programmer ..., that the much maligned
tendency to start coding too early may be a
symptom of wanting to write something formal
but lacking any alternative language in which
to state the partially thought-out ideas!"
When we begin thinking about a programming task, we visualize parts
of the algorithm, some of the details of the data structures, and
know something about what is expected for input and what is expected
for output. We should be able to begin specification of the program
with this amount of information. Since the final product of our
endeavors is to be an algorithm, it is reasonable to expect some
mechanical aid in the construction task. There are at least three
ideas expressed in the preceding remarks.
When a programmer says "I want a loop here", he is expressing a
desire for an iterative construct. The syntactic details will vary
from programming language to programming language, but the idea of
iteration is what is important. So, first, we are asking for a
simple specification language for describing the programming
structures. This language must be sufficiently powerful so as to
allow construction of "reasonably natural" programs. Certainly this
second criterion can be satisfied simply by providing one of the
currently available languages. However this seems unacceptable. All
current programming languages allow the construction of programs
which far outstrip our abilities to analyze their formal properties.
We can too easily construct programs which are so ill-structured or
opaque that analysis of their correctness is not practical. It seems
more to the point then to develop a powerful dialect of some existing
language, representing a subset which is amenable to proof
techniques. Then, as our understanding of verification increases and
our techniques for program construction improve, we can make
extensions of the original language. However the original dialect
should be chosen so that we can speak with some assurance about the
properties of programs contained in it.
Second, we would expect to begin expressing our algorithm to a
machine as we are constructing it. If we must wait until we have
completely described the algorithm on paper, say, then we are
ignoring a great deal of the power of the machine. We should expect
to be able to use the machine throughout all phases of program
construction, validation and optimization. We would wish to have a
programming environment in which a user can (1) describe an
algorithm, (2) construct a program, (3) experiment with and (4) debug
that program and (5) modify it if desired; (6) verify that the
program realizes the described algorithm, (7) experiment with
improvements to the program, and (8) compile and finally (9) execute
the correct program. One could further desire that (10) an automatic
program generator be available to take the initial algorithmic
description and present the user with a complete program, guaranteed
to be correct and efficient. However the thrust of this proposal is
the construction of well-structured programs through user
interaction. The "structuring" of programs derives from the process
of manipulating programs from a concise algorithm to an efficient
program, rather than from the presence or absence of constructs in a
particular language. In essence we look upon a computer as a
manipulator of structures rather than a manipulator of symbols or
numbers. One of the useful structures which we wish to manipulate is
a program.
The issue of useability is crucial to the success of this project.
We wish to present this system to working programmers as an
alternative to current on-line edit-debug-run systems. Such a user
must be comfortable with the interface which will be used to
manipulate programs of the programming language.
The applications of such a language system extend beyond simply a
programmer's aid. Such a programming environment should be an
invaluable aid to students learning about algorithmic processes and
computation. Construction of programs in the careful manner which
this lab supports should reinforce the concepts of structured
programming. The addition to the basic lab of a Monitor-Helper
module could bring Computer Aided Instruction in programming
languages far beyond the current "programmed textbook" approach.
Given a natural specification language and a systematic means of
introducing parts of the algorithm to the machine, it is the third
idea which molds the proposal into an important tool. This third idea
is correctness. What will unify the specification of the algorithm
with the intentions of the programmer is an underlying logic of
program construction. Each programming construct --an iteration, a
conditional, a procedure call-- carries with its application certain
implications for correctness. We know that logical rules of
inference, properly applied, are guaranteed to maintain provability
in a formal system. In a similar manner research has begun in the
specification of programming constructs to relate intentions with
programs. Thus the programmer can continuously relate the
construction of his program to his expected intentions. When the
expectations are not realized, he can modify his construction or his
intentions. There is no need to wait until the program is (faultily)
completed.
To summarize then, the research plan will attack the problem of
constructing reliable programs through three related areas: education
and style in programming, interactive programming assistance, and
design of specification languages based on correctness considerations.
STYLE--structure and abstraction
We have separated out the discussion of programming style because we
feel it is a very important aspect of our investigation. Indeed an
appreciation of the importance of good programming style is a
necessary ingredient in any discussion of programming methodology.
It is the cultivation of good programming style coupled with
computer-aided programming tools which will make an impact on the
problems of reliability. The two elements of programming style which
we wish to emphasize informally and support in a programming
environment are: (1) the importance of abstraction and (2), the
importance of correctness proofs for algorithms. These two
ingredients form a basis for a pedagogy which can shed significant
light on the education of programmers. Current educational policies
have been woefully inadequate, doing little more than producing
sophisticated coders. The current educational approaches stem from a
bottom-up development of the field. That is, generation of
programming aids proceeded from octal coding to assembly language to
the first Fortran compiler. The difficulty is that people are still
trained in this historical sequence; thus people are trained to think
in terms of representation and the programming languages which we use
are designed to support this bottom-up way of thinking. That in
itself isn't bad as long as we realize that, but we must attempt to
develop techniques for talking at a higher level of abstraction.
The ideas of abstraction are closely related to the ideas of
"structured programming". But it is the ideas we are after as
espoused by Dijkstra and implicit in the papers of McCarthy; we are
not arguing syntax or the efficacy of goto's. Our emphasis is in the
"-ing" of "structured programming", for it is in the act of creation
that the structuring lies, not necessarily in the end product.
Indeed, in the machine's memory, all programs look alike. The
essence of the ideas is that we should program in "levels of
abstraction"; when we are satisfied with the correctness and adequacy
of a specific level of abstraction, we can begin elaboration of
subsidiary levels. These arguments are old, but still not well
understood even by competent programmers. They should know better, yet
every LISP programmer "puns" and uses "car-cdr-cons" throughout the
construction of even the highest level routines. Surely the
programmer is thinking about representing a data structure when he is
doing these operations, and just as surely he could use more mnemonic
names. But the names are not as important as the fact that using
"car-cdr-cons" ties the program to a specific representation. It is
this latter condition which is unforgivable.
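To make the point concrete, here is a minimal sketch in a LISP-like
notation; the names DISTANCE, MAKE-POINT, X-COORD, and Y-COORD are
invented here purely for illustration and are not part of the proposal:

    ;; tied to a representation: the reader must know that a point is a
    ;; dotted pair of two numbers
    (DEFUN DISTANCE-CONCRETE (P Q)
      (SQRT (+ (EXPT (- (CAR P) (CAR Q)) 2)
               (EXPT (- (CDR P) (CDR Q)) 2))))

    ;; abstract version: only the three small definitions below know
    ;; how a point is represented
    (DEFUN MAKE-POINT (X Y) (CONS X Y))
    (DEFUN X-COORD (P) (CAR P))
    (DEFUN Y-COORD (P) (CDR P))

    (DEFUN DISTANCE (P Q)
      (SQRT (+ (EXPT (- (X-COORD P) (X-COORD Q)) 2)
               (EXPT (- (Y-COORD P) (Y-COORD Q)) 2))))

Changing the representation of points --to arrays, say-- now touches
three one-line definitions rather than every routine that mentions points.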
What we are proposing is that the description of operations and data
structures on any level can best be done with a simple command
language, and the effect of the commands is to generate, modify --
generally manipulate-- pieces of algorithms and abstract data
structures in a specification language, rather than a specific
programming language. One may certainly say that such a
specification language is really a simple (and most likely
inadequate) programming language, since the output will be an
abstract algorithm. But the point is to develop a notation for
describing algorithms and to develop techniques for reasoning about
such programming abstractions. Just as we do not expect numerical
analysts to reason about their FORTRAN programs we should not expect
the data-structure programmer to reason about his LISP program.
The second ingredient we mentioned above --correctness-- is foreign
to many programmers. They simply do not realize that there is
anything to be proved. They have been trained to write and debug
programs. A program is deemed acceptable when it "looks right" and
excites no bugs. Thus even at the informal level there is much to be
gained by introducing them to the ideas of correctness. What our
programming environment will in fact do is give support to program
construction based on correctness considerations.
Just as the validity of a theorem in mathematics must be based on a
convincing proof, we should expect our programs to be subjected to
the same degree of conviction. Likewise in mathematics, when an
informal proof is suspect, the purveyor must be able to give a
convincing demonstration. Often this can be done in terms of formal
arguments. That it can be done is by no means a guarantee that it
will or should be done. Seldom do the pages of mathematical journals
contain formal proofs. In any case it is the informal reasoning
which predates the formal, and for most of us, it is the informal
"proof" which carries conviction. It is this degree of "proof" which
must currently be given priority in programming. We feel that the
programmer's degree of "conviction" about the correctness of a
program can best be raised by developing the algorithm in a simple
specification language which supports abstraction after he has been
convinced of the necessity of proofs.
Clearly, the area of "aesthetics of programs" is very shaky ground.
But there are some tests for adequacy: is the program easier to
read, debug, modify, or prove correct than another? It will only be
after a better understanding of the basis for correctness of programs
is begun at the informal level, that a rigorous formalism will be
successful. Thus education about the benefits of abstraction plus
the on-line tools to reinforce the programming style should lead to a
realistic programming methodology.
We should not attempt to support any programming style just because
it exists, neither should we attempt to legislate against a specific
methodology because it does not conform to our current stock of
correctness techniques. What we must do is to show that better
programmers and better programs result from this style. We must
learn to express ourselves better and develop programming style as
well as methodology.
One practical difficulty to better program design is that programming
aids still haven't progressed much beyond the concepts introduced by
Fortran. Certainly embellishments and improvements have been made in
programming languages, but the basic ideas are the same. The only
recognizable improvement is the development of sophisticated display
interaction, allowing programmers to edit and debug their programs at
a greater level of productivity. We believe there is a development
underway which should make a considerable improvement in
productivity. That is the development of interactive programming
systems. Most people nowadays understand what is meant by an
interactive program. What we are advocating is the investigation of
interactive programming: that is, computer-aided construction and
verification of programs. This is a comprehensive attack on the
problems of reliability which combines current technology in
display-based programs with current research into program
construction, verification and debugging. In a subsequent section we
will discuss more fully some of the recent research results.
So to summarize this discussion on programming style, we claim that
one immediate benefit of our research will be a concrete experiment
showing the efficacy of structuring techniques in the teaching of
programming. The success of such an experiment can be measured: Are
programmers who are exposed to such an education better than their
counterparts, trained in the usual methods? Are their programs
easier to read, modify, and debug? Can they generate their programs
faster?
INTERACTIVE PROGRAMMING--construction by selection
Unfortunately "interactive programming" means too many things.
Programming aids varying from clever spelling correctors to
sophisticated editing and debugging tools are all within the
terminology. We claim there is a more fundamental interpretation of
the phrase. It involves the construction of the program and consists
of two parts: a specification language in which a programmer can
concisely specify the algorithm; and a command language for
formulating, modifying, testing, and running partially specified
programs.
On-line editing and debugging aids are now quite common-place. We
should look carefully at the best of these and see what essential
features can be applied to program construction.
The best editing facilities derive from TVEDIT and its dialects. This
editor is display-based and allows the user to access text files in
pages like a book. The editing commands give the ability to insert
arbitrary text within a page; to shrink a page by deletion or to edit
existing lines. What makes these editors superior is that the
display window always reflects the state of the user's manuscript in
a natural format. When text is inserted the portion of the screen
below the insertion is moved down; when lines are deleted, the
succeeding text is moved up. When corrections are made the old text
is overwritten on the screen. Those who are familiar with such
editors would rather compose text on-line than write out the
ideas on paper and then repeatedly rewrite sections until an
acceptable product is arrived at.
There is the sophisticated display-based debugging program named
RAID. It allows the user to monitor the execution of his machine
language program displaying the contents of selected registers. As
the contents of these locations change, the user sees this effect. He
has commands at his disposal for stepping through his program;
examining locations; and for halting execution in case certain
conditions occur. Again people who have used such debugging
techniques would not consider looking at memory dumps. Lately several
researchers have begun to apply display-based techniques to the
debugging of programs written in higher-level languages. One of the
essential ingredients in these successful programs is that of
naturalness of expression. The user is able to think and react more
in his terms than in those convenient to the machine. The display
devices play a crucial role by giving the user a lot of pertinent
information all at once. We should begin thinking of analogous
systems for the professional programmer.
What is analogous to "naturalness" in the realm of program
construction? We believe it is closely related to what D. Swinehart
call "manipulative activities" as compared to "symbolic activities".
As an example of "manipulative activity" he notes that in playing a
musical instrument one does not think in terms of plucking strings or
pushing keys, but thinks in terms of producing notes or melodic
phrases. Swinehart initiates a study of some computational activities
with an eye toward producing commands simple and natural enough that
while doing them we need think only in terms of their effect. "In
this way they [the commands] tend to lose any symbolic meaning and
tend to become practically bodily extensions." In the domain we are
examining, that of program construction, we would expect to be able
to manipulate program constructs in a manner as close as possible to
the way in which we think. We think of loops, or recursion, or
assignment; we think structurally. To exploit our analogy with music
again: in playing the piano we need to recognize which key should be
selected to produce the desired note. Using a stringed instrument, we
have more responsibility; we have to locate the proper position on
the fingerboard; frets simplify the situation somewhat, but the
mechanics of producing music is still more complicated. In computer
science, the "compositions" we wish to write are programs. What we
are advocating here is that we do our composition by selecting the
"notes" --the programming constructs--, rather than being forced to
construct each "note" from symbols.
Current editors, like TVEDIT, are thus oriented toward "symbolic
activities". They work as well for document preparation as for
program preparation, even though the tasks are quite distinct. Few
of us still think in a text-oriented card image fashion when
describing programs. We think in terms of the structure of either
the program or the data. What we propose is to give critical
examination to the process of program construction with the aim of
producing a language for constructing algorithms.
For example when we think "Now I want to write a recursive algorithm,
say f[x;y] <= ..., recursive on y", that specifies a great deal of
the structure of f:
the body of f is a conditional expression;
the predicates in the conditional are recognizers, determined by
the type of y;
the corresponding expressions in the conditional are yet to be
determined computations.
But as we elaborate these computations, the information implied by
the branching guides us (or a machine) in maintaining low-level
consistency checks on further function calls. As we develop the
algorithm, we elaborate the function calls and elaborate, or pick
representations for, the abstract data structures. This makes an
excellent way to present data-structure programming in the class
room, giving a representation-independent presentation. One point of
this is that this approach is also an admirable way to teach
programming in general and the "monologue" which the instructor uses
to describe the construction can be turned into commands to a
"program constructor".
Thus we are interested in developing a command language for
construction and manipulation of programs. The output from such a
session is to be an algorithm, described in a specification language
whose components are rich enough that a very large class of
data-structure manipulating algorithms can be naturally described.
At the most concrete level we need a "structural editor": an editor
which is controlled by structural requests rather than by
strings.
Fred Hansen demonstrated the feasibility of such an editing scheme in
his EMILY system. There he used a formalism like BNF to describe the
syntax rules of a language. The programmer, sitting at a display
console, was presented with a "menu" of syntax equations. He could
then select instances of these rules, building up a syntax tree. He
could thus replace any of the non-terminal nodes on this syntax tree
using elements in the menu. As a result of this systematic
construction, the program was always guaranteed to be syntactically
well-formed. For us, the essential ingredient of his scheme was
programming by "selection, not entry". That is, we select programming
constructs (by structure) rather than build the program as a string
of characters (symbolically). The mechanisms for selecting
construction schemas from the menu are thus related to Swinehart's
concept of "manipulative activity".
There are several difficulties involved in Hansen's system. His
system addressed itself too strictly to syntactic correctness. We see
the opportunity to extend this "selection, not entry" idea to aid
semantic correctness. It is currently feasible to associate a proof
construction with the program construction and thus give the
programmer added tools for construction of reliable programs.
Secondly, he was constructing programs in a "real" programming
language with all its syntactic complexity. Thus his user frequently
had to specify application of syntax equations which had no useful
"semantics". Relying in a very simple specification language should
allow algorithmic specification which is more "semantic".
However we gain much from the ideas presented by Hansen: the style of
program construction, and a significant effort in terms of the "human
engineering" to make such a system viable. The benefits of
"selection, not entry" which were observed by him will be available
to us as well. For example the difficulties of syntax --well-formed
programs, parsing, and ambiguity-- are greatly alleviated since we will
be describing which rule we wish to apply. Likewise the number of
keystrokes necessary to specify a program will be significantly
lowered.
In terms of the "manipulative activities", such a system has much to
offer. The command language will allow the programmer to present his
algorithms to a machine in terms more natural to himself. The
transition from idea to working program should be much easier. Since
the user need think less about programming details he should be able
to maintain a better grasp of his problem and thus, at the informal
level, be better prepared to construct a correct program. The
additional benefits of having an underlying formalism to reinforce
the informal feeling of correctness with a formal proof are
exciting indeed. Thus there are commands to manipulate the semantic
components of the program being constructed. Formalisms do
currently exist which can become an appropriate basis for such a
command language. We will describe one such in the next section.
However, programming language semantics is in an early developmental
stage and we are not advocating a commitment to any school of
semantics. Our primary concern will be to show that a programming
style plus the computer-based mechanisms to support the pedagogy can
make a significant contribution to the quest for reliable software.
The mechanics of such an editing system clearly require the use of
display terminals. We envision a command language which will require
rapid display of program segments, rules of construction, partially
completed proofs, execution of program segments, modification of
program segments. Such behavior would not be feasible on other
devices.
FORMALISMS--specification languages and correctness
At the most obvious level, we need to develop a specification
language whose constructs are rich enough to allow natural
description of data-structure algorithms. Several candidates are
available. A choice needs to be made, but the important aspect of the
specification language is that its semantics be conducive to
establishing formal correctness.
The style of semantics advocated by Hoare is one example. This
school of semantics, usually called axiomatic semantics, attempts to
analyze programming constructs in terms of axioms and rules of
inference. The approach is similar to the study of deducibility in
the formal systems of logic. In those studies, we say a proof of a
statement S, from a set of assumptions π, is a sequence of statements
whose final element is S, such that each statement is either an
axiom, an element in π, or the statement follows from preceding
statements by application of a rule of inference. Hoare's axiomatic
rules deal with correctness of program construction. His
notation, P{q}R, is interpreted as meaning: "If statement P is true
at a particular point in a computation, then statement R will be true
after execution of the construct, q." The statements P and R are
called the input assertion and output assertion, respectively. The
language which we use to express such statements is called the
assertion language. A rule of inference in this formalism is the
following:
P{A}R, R∧S{B}R, R∧¬S⊃Q
______________________
P{A; while S do B}Q
This is usually interpreted as saying: if we can deduce the three
components above the line, then we can deduce the statement below the
line. This particular rule is called the "while-rule"; note that R is
a statement in the assertion language. R is called the invariant of
the loop. Typically, the discovery of the loop invariant is a
difficult task.
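As a simple worked instance, constructed here only for illustration,
consider a loop which sums the first n natural numbers. Taking A to be
the initialization, S the loop test, B the body, and R the invariant,
the rule specializes as follows:

    goal:        n≥0 {i←0; s←0; while i≠n do (i←i+1; s←s+i)} s = n(n+1)/2
    invariant:   R:  s = i(i+1)/2  ∧  i≤n

    premise 1:   n≥0 {i←0; s←0} R                   (the initialization establishes R)
    premise 2:   R ∧ i≠n {i←i+1; s←s+i} R           (the body preserves R)
    premise 3:   R ∧ ¬(i≠n) ⊃ s = n(n+1)/2          (R at loop exit gives the goal)

Read as a verification, the three premises are easily checked; read
constructively, they are exactly the subgoals the rule hands back to the
programmer, and the need to state R forces the invariant to be
discovered while the loop is being written rather than afterwards.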
The usual application of Hoare-style semantics is in the context of
verification of fully specified programs. The programmer writes his
program, interspersing assertions within the body of the program.
This augmented program can then be analyzed mechanically and,
provided sufficient assertions have been included, a set of
statements in the assertion language will be constructed. If we can
establish the validity of these derived statements (called
verification conditions), then we can claim that the program
satisfies its assertions. If the assertions characterize the behavior
intended by the programmer then we can claim that the program meets
its specifications; the program is correct.
Several objections have been raised concerning the practicality of
Hoare's techniques:
(1) the problem of discovering sufficient assertions is frequently as
difficult as writing the program;
(2) the difficulties of generating correct programs are better
handled by "constructive" techniques which are applied while the
program is being written, rather than attempting to certify an
existing program.
The first objection stems from at least two sources. The usual
assertion language is a predicate-calculus notation. Frequently
programming constructs do not fit well into this language. This is a
question which we plan to attack. The more fundamental difficulty is
that it will always be more difficult to give a convincing argument
about correctness, than just to appeal to intuition. Intuition is
frequently wrong; that's when we have bugs. Another reason for
difficulty stems from the rather poor state of programming practice.
One normally thinks of assertions as statements about "what" a
program does, and thinks of the program as the "how". The "what"
might be "permutes the elements" or "see if x is a conditional
expression"; the programmer usually gives the "how" as a
representation like "t←x;x←y;y←t" or "(EQ (CAR EXP) (QUOTE COND))".
Expressing the "how" in such representation-dependent terms is an
unnecessary source of difficulty.
The second class of objections stems from a misunderstanding about
Hoare's rules of inference. There is no reason why these rules of
inference cannot be used to guide the construction of the program.
Consider the "iteration rule":
P{A}R, R∧S{B}R, R∧¬S⊃Q
______________________
P{A; while S do B}Q
This rule can be used in program construction as follows. It says: "If
we wish to elaborate, as an iteration, a yet-unspecified segment C in a
partially constructed program
    partially specified input assertion { ..... ; C } partially specified output assertion
then we had better be prepared to justify the three subproblems above
the line." The structure of the three subproblems can be used by the
programmer to continue his construction. If some of the subproblems
are obviously unattainable then the user might withdraw the
application of the iteration rule. The structure of the subproblems
might also help the user specify more of the program or specify part
of the assertion, R. The point is that there is a strong interplay
between the construction of the program and the construction of the
assertions which are required by Hoare.
This interpretation of Hoare's rules is surely "constructive",
following the dictates of structured programming; and if the
programmer uses care in picking representation-independent
programming constructs he can mitigate the complexity of the
construction-verification process. It is this constructive
interpretation of axiomatic semantics which we wish to pursue.
The preceding perspective on axiomatic semantics leads us to believe
that a useful interactive programming system can and should be built.
This involves a command language to select rules, make substitutions,
prove lemmas, do simple bookkeeping, verify subprograms, or construct
program segments.
Clearly the techniques for correct construction are not limited to
Hoare's logic; other specifications of semantics are possible. Also
it is well known that Hoare's assertion language, first-order
predicate calculus, is not ideal. But the desirability, perhaps the
necessity, of establishing the correctness as the program is written
is important and can easily be exploited at the informal level
currently, and can be done even formally in some interesting domains.
Research into semantics and proof techniques is necessary.
We have carried out experiments using the current formalism of PASCAL
as described in Igarashi-London-Luckham (V1-V8 pp.28) in guiding the
correct construction of a few algorithms -- unification, for example.
The experiments show that it is indeed reasonable to carry out
construction and verification in parallel. It is clear that a
display-based system and carefully designed user-features are
necessities. The printed trace of the simulated interactions is
quite messy, but the logical structure as displayed on the console is
quite approachable.
It therefore does not seem unreasonable to consider "certification"
restrictions when designing our specification language. If we expect
to submit programs to proof techniques it does not seem at all
unnatural to require that the language constructs match the power of
our formal methods, any more than requiring that the syntax match our
ability to produce efficient parsings or that programming constructs
have efficient run-time representations. The active participation of
semantic methods in the process of language design is much overdue.
If the current semantic techniques are not powerful enough to allow
natural description of a useful language and to formulate reasonable
proofs, then we should know it. To unduly raise user-expectation by
supplying a language which far outstrips our ability to semantically
analyze its programs will lead to immediate and valid criticism, just
as syntactic constructs which require an undue amount of processing to
discover the parse are discouraged by syntax-directed methods.
The language presented for the description of the algorithm and data
structures must be approachable. It should include facilities for
describing data structures. The description facility should be
general and easy to use. It should also allow the user to specify
input/output conventions for his data structures. As an accompanying
feature, the language should contain strong typing. A significant
number of programming errors are the result of computations with
mismatched types. We have had some very promising experience with a
LISP-like language which supports abstraction.
A proper characterization of abstract data structures entails a
discussion of abstract control structures. That is, control
structures present in a programming language are concrete; they
operate on the concrete data structures of the language. If we are to
separate data structures from their representation, then we must
likewise be able to separate control from acting on specific
representations. For example, if we wish to use set-valued variables,
then we most likely will want the set-membership relation; when we
wish finally to map sets onto a data structure, we will need a means
of characterizing membership in terms of existing control structures.
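A minimal sketch of such a separation, written in a LISP-like notation
with names invented here, assuming for illustration that sets are mapped
onto duplicate-free lists:

    ;; only these four definitions know the chosen representation
    (DEFUN EMPTY-SET () NIL)
    (DEFUN MEMBER-P (X S) (IF (MEMBER X S) T NIL))
    (DEFUN ADJOIN-MEMBER (X S) (IF (MEMBER-P X S) S (CONS X S)))
    (DEFUN FOR-EACH-MEMBER (FN S) (MAPC FN S))   ; abstract control: iterate over a set

    ;; a program written against the abstraction, such as
    ;;   (FOR-EACH-MEMBER #'PRINT S)
    ;; survives unchanged if sets are later mapped onto, say, bit vectors.

The abstract control structure FOR-EACH-MEMBER is what allows
set-membership iteration to be stated without committing to any
particular concrete control structure.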
Work will need to be done to give adequate axiomatic semantics for
both data structure and control structure definitions. We will need
to study the questions of program modification and its effect on
corresponding proofs. We will need to examine the question of mapping
programs and proofs from the specification language to an actual
programming language. We will need to investigate the problems of
optimization of data structures and programs. J. Low's recent thesis
on representation of data structures is relevant here.
TRANSCRIPTS--An approach to documentation
One of the nagging difficulties in programming is documentation.
People forget what they were doing. Thus a crucial part of a good
programming style must be an adequate documentation facility. The
typical approach in programming is to intersperse comments within the
body of the program. These comments usually express some of the
programmer's intentions about the behavior of the program (informal
assertions). The difficulty with this approach is that the act of
programming is dynamic and much of the impact or direction which the
programmer is pursuing is tied up in the development of the program,
rather than in the final product.
Using our interactive style of abstract programming, proceeding by
layers and elaboration of data structures, we have access to a
fundamentally different style of commentary. Namely, the history of
the commands which the user gave to develop his program carries the
impact of the actual process of programming.
Several proof checkers maintain a file of the steps that the user
gave in an interactive session. Upon completion the final proof can
be extracted from the trace. A similar device should be applicable
in interactive programming. One of the hallmarks of interactive
programming is "selection, not entry" meaning that rather than typing
in the algorithm in a typical line-by-line style, we should give
commands which will select programming constructs. Obviously we must
enter some parts of the algorithm, identifiers for example, but
selection should be the rule rather than the exception. The sequence
of commands given by the user forms a transcript of the development
of the program. This transcript, when played back, gives a
documentation of the programming enterprise. This "documentation by
transcripts" of the transactions with such a system can be exploited
with great benefit. The two main ideas of "structured programming"
are (1) abstract syntax (data structures) and (2) the process by
which the program is written. Thus programs themselves are really
not structured or ill-structured; it is the "-ing" of "structured
programming" that is important.
The transcript file can be used in many modes. As we have noted, the
working programmer can use it to refresh his train of thought. We
also envision the transcript as the main source of documentation for
programming tasks. When modifications to a program are desired, we
simply play back the transcript to the point where modification is
desired. A new transcript is then created. Thus the documentation of
a program is always consistent with the program.
There are clear applications of this technique to the proper
education of programmers. Many other crafts are best learned by
watching an expert. To gain expertise in chess we might study the
games of the masters. In cooking we frequently read cookbooks, but
can gain more by watching a practicing chef (either in person or on
television). We may learn the mechanics of a skill perhaps by
reading, but we develop a style and flair by watching a practitioner.
We can introduce the novice to much of the style and flair of
programming by supplying transcripts of exemplar programming
sessions.
RELATION TO OTHER WORK
Here we will attempt to relate current research to the approaches
advocated in this proposal. We will examine interactive systems,
automatic programming and verification, and semantics of programming
languages.
Several systems have been built which explore the area of interactive
programming. One of the most recent is described in the thesis of D.
Swinehart. This thesis is also a good survey of related work. The
basic flavor of the work on interactive programming is to give aid to the programmer after
he has constructed his program. Swinehart gives excellent coverage
for interactive debugging and monitoring of running programs. He
recognizes the absolute necessity of sophisticated display techniques
when programming interactively.
We would differ from him in two ways. More superficially, we would
emphasize the construction of algorithms in a specification language
rather than in a particular programming language. More importantly,
our emphasis is on the initial attempt to construct correct
algorithms. To use his terminology, we are interested in developing
"manipulative" techniques in program construction, but his editing
and construction is at the "symbolic" level. We believe that
programmers think "manipulatively" and that exploitation of this
behavior by a system can measurably aid the construction of correct
algorithms. Thus we believe that an interactive programming system
should become useful at the immediate level of formulating the
algorithm, not postponed until the debugging phase. The system should
encourage a systematic style of programming consistent with the
notions of abstraction.
We have already mentioned our interest in the work of Hansen. The
systematic, or structural approach, to program construction is quite
promising. Our approach adds a semantic component to the program
development; as the program is constructed, our proof can be
developed. Our user interface should be less complex since we will be
constructing algorithms in the specification language, rather than in
a complex programming language.
We see future applications of work like J. Low's in the
transformation of data structures to "more efficient"
implementations. He recognizes the benefits of establishing
correctness of programs at the highest possible level of abstraction,
consistent with the structuring properties of the algorithm. Though
he doesn't directly address the questions of correctness, certain
conclusions are apparent. For example he discusses the
implementations of sets as a data structure. He notes that the usual
implementation is in terms of sequences, or lists with special
operations to effect the "set-hood". He notes that it should be
easier to prove the correctness of the algorithms as set operations,
rather than as special relationships on sequences. Indeed if we
think about the problems of verification of the "implemented"
algorithm, we would note that our intuitions about the set-algorithm
which we wish to express as assertions, would also have to be
translated to relationships involving sequences. Thus we would have a
double source of possible errors: our formulation of the algorithm in
terms of sequences could be in error, and our expression of the
assertions over sets could be translated inadequately into assertions
over sequences. N. Suzuki has noticed this behavior in his discussion
of verification of sorting. However it appears that he has only
simplified part of the problem; the algorithm is still encoded at the
low-level representation, while the assertions are expressed in the
higher level, with explicit mappings given to relate the high-level
assertions with the low-level code. It seems much more valuable to
address both questions-- construction and verification-- at the
higher level and perform the transformations to implementations
afterwards.
We have performed similar experiments with tree-sort. Tree-sort is a
prime example of a conceptually clear algorithm rendered totally
obscure by including the representation of the data structures in the
algorithm. When the representation is abstracted out and the
structures are referenced by abstract operations, the behavior of the
algorithm is quite transparent.
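A minimal sketch of what the abstracted form might look like, in a
LISP-like notation with names invented here (this is an illustration,
not the experiment referred to above):

    ;; only these definitions know how an ordered binary tree is represented
    (DEFUN EMPTY-TREE () NIL)
    (DEFUN EMPTY-TREE-P (TR) (NULL TR))
    (DEFUN MAKE-NODE (LEFT VALUE RIGHT) (LIST LEFT VALUE RIGHT))
    (DEFUN LEFT-SUBTREE (TR) (FIRST TR))
    (DEFUN NODE-VALUE (TR) (SECOND TR))
    (DEFUN RIGHT-SUBTREE (TR) (THIRD TR))

    ;; the insertion algorithm never mentions the representation
    (DEFUN TREE-INSERT (X TR)
      (COND ((EMPTY-TREE-P TR)
             (MAKE-NODE (EMPTY-TREE) X (EMPTY-TREE)))
            ((< X (NODE-VALUE TR))
             (MAKE-NODE (TREE-INSERT X (LEFT-SUBTREE TR))
                        (NODE-VALUE TR)
                        (RIGHT-SUBTREE TR)))
            (T
             (MAKE-NODE (LEFT-SUBTREE TR)
                        (NODE-VALUE TR)
                        (TREE-INSERT X (RIGHT-SUBTREE TR))))))

The argument that the tree stays ordered can be carried out entirely at
this level; whether nodes are later represented as three-element lists,
arrays, or records is a separate, local decision.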
Work like Low's would be an integral part of a large system, but it
seems that the primary area of research should be focussed on
developing the tools for adequate construction of correct algorithms
with questions of efficiency becoming of secondary importance.
In two papers, C. B. Jones and C. D. Allen give the formal proofs for
complex algorithms (a form of Earley's parser, and Hoare's FIND) as a
sequence of step-wise developments as the structure of the algorithm
is elaborated. At each level more and more detail is given and
perhaps more structure is added to the data -- sets represented as
sequences-- as the detail enlarges they need to prove that the
hypotheses of the previous level hold. These papers show the benefits
derived from a systematic development of proof and program.
There is related work being done by R. A. Snowdon and P. Henderson at
the University of Newcastle upon Tyne. They have developed systems to
help with the construction and debugging of structured programs.
They have investigated simulation techniques to aid in the
specification and debugging phases.
Several other institutions are actively investigating correctness and
reliability in programming. Indeed, there was a recent international
conference on reliability and a new journal devoted to this topic is
being published. However we believe that there are basic flaws in
current approaches to program reliability. Past experience has shown
that the edit-run-debug cycle is quite insufficient. We believe that
variations of this approach as represented by clever
spelling-correctors or more powerful editors or debuggers will not
reach the basic problems of constructing reliable software. Similarly
we feel that attempts at knowledge-based systems will not lead to
general principles for the discipline of programming, unless of
course the knowledge-base is "programming style". The substance of
such a base is correctness. How do you maintain or recognize
correctness? Informally, we keep our programming segments manageable
and natural so that we can maintain an "informal conviction" about
their validity. Formally, things are not so clear. The area of
formal correctness is finally at a point where a proposal such as
this can have significant impact on programming discipline. But much
research is yet to be done. Research into methodology is necessary
to establish "rules of deduction" but current approaches, in
verification particularly, tend to be much too formal; we don't yet
even have a good handle on maintaining correctness at an informal
level. Difficulties of current verification approaches:
1. assertions are most usually supplied by the writer of the
program. The danger is great that the misconceptions about the
program will spill over to misconceptions about what are sufficient
assertions. A means of alleviating this difficulty is to allow the
highest possible representation of the algorithm. Instead of proving
properties of pointer-manipulations, state the algorithm (being
represented by the program) at a higher level and verify properties
of that algorithm.
2. The related problem of just too low-level a representation of
algorithms and data structures in current programming practice. LISP
programs should never use car-cdr-cons. We should follow the lead of
say numerical analysis, proving properties of algorithms, encoding
those algorithms as programs (with some degree of informal
confidence), and resorting to proofs of the implementations only in
those cases where the representation makes a difference. For example
approximation or efficiency considerations may require more careful
analysis due to machine representation; but the confidence in the
basic numerical scheme is established using the mathematical
properties of the functions, not the representations on a specific
machine.
3. The mechanics of getting people to write better programs is
ignored. Verifiability is only one facet of the problem of reliable
software. The technology which will be available in the foreseeable
future will not be sufficiently powerful to verify non-trivial
programs, written by competent programmers who are not trained in
formal methods. We should begin to attack the questions of how to
make better programmers and as a spin-off from this we should see new
techniques for aiding correctness results.
4. Virtually all verification work has been done on programs which
are known to be correct. The problems of generating a correct program
using verification as an aid are virtually unexplored. Used in the
context of construction, verification conditions serve as
a consistency check against the current representation of the
program. Here we will be faced with deciding which to believe -- the
verification conditions or the program. Since both assertions and
program are in the process of being generated, we will have to be
prepared to "debug" both simultaneously.
5. There is also a certain level of "idealization" involved. The
semantic properties of several language constructs are such that
computations either may not terminate or may be undefined. Reduction
rules are frequently given which ignore undefinedness or non-termination.
Also, as with any correctness proof, the correctness is "relative".
But that's really all we can ever expect. The real question about
verification is whether it is really the best way to proceed.
However, program verification techniques hold much promise.
1. at the most superficial level, the programmer can gain much even
if he only understands the problems of correctness at an informal
level. The awareness of the difficulty of programming in general,
and the realization of the consequences of the constructions he makes
in his program (as he makes them) should lend sobriety to the whole
programming process.
2. It is not at all clear that the formal techniques are
diametrically opposed to good programming design. Most experiments
have been carried out on completed programs, and indeed on programs
written in very specific representations. It appears that (1) program
construction and verification carried out together, (2) programming at
the highest level possible, and (3) programming with
step-wise elaboration should all make the formal verification more
realistic.
As this section shows, our approach will encompass several diverse
areas of research, ranging from the very theoretical, to the
educational, to the quite practical use of displays. We feel that it
will only be through such an integrated attack on the problems of
reliability that real progress will be made.
MILESTONES--Goals and Applications
Since little work has been done in this area, our first task is the
construction of a prototype system to demonstrate the style of program
construction which we are advocating. The specification language need
not be particularly sophisticated, but the experience with the tools
is of utmost necessity. Such a system should contain a simple
specification language, a simple command language for specifying
program constructions, a transcript-commenting facility, and the
correctness tools for the rules of inference as specified by Hoare.
Such a system can be used for demonstration purposes, for
understanding what modifications or additions are called for, and
should certainly be useful in the development of succeeding systems.
We would expect to use this system in programming classes as soon as
possible. The clear candidates for such applications are the courses
in Systems Programming, CS140A-B, and Symbolic Computation, CS206.
Currently CS206 uses a CAI-LISP to introduce the coding aspects of
LISP. This approach is quite low-level, ignoring everything but the
mechanical aspects of the programming language. The content of
CS140A-B stresses the importance of correctness and reliability; this
course could easily use the proposed system to reinforce the
classroom presentation.
Applications of this programming environment to high school education
are quite exciting. The applicability of a proper study of abstract
data structures and their algorithms extends far beyond their
applications as programming tools. We agree with K. Curtis:
"...it is time to start thinking about
teaching mathematics in the context of
computer science rather than teaching computer
science in the context of mathematics."
Indeed we would go much further. The ideas involved in generalized
algorithms extend far beyond applications to the teaching of
mathematics. We would claim that the study of such algorithms with
an associated programming environment can do much to rejuvenate high
school science curricula. Though the relevancy questions can be
answered by relating such material to programming, we are definitely
not advocating the teaching of programming in high schools. Rather
studying algorithms introduces ways of thinking, just as Geometry has
as its primary goal the introduction of axiomatics. We do not expect to produce
geometers; we should not expect to produce programmers.
Philosophically we can argue that data structures and algorithms are
fundamental disciplines, and pedagogically, we argue that such
computations are more natural and inviting than what is typically
required in conventional mathematics. The intuitive motivation and
reinforcement of realizability on machines is important. The lab we
are proposing would be an integral part of our study. Since we feel
strongly that it is the underlying concepts, rather than the surface
properties of the programming language which are proper material for
study, we will develop a suitable text and monitoring system to
supplement the basic lab.
Future extensions of the system are varied. At the theoretical end
of the spectrum, further work is needed on the specification of
semantics of programming languages. We should investigate extensions
to other programming constructs, like those used in specifying
operating systems. Little is known about the effect of program
modification: if we make changes to an algorithm, what can we say
about the correctness proof?
The systems programming and human engineering aspects must be
explored. We can easily imagine background facilities to do
consistency checking: correct calling sequences (data types, arity of
functions). Also, low-level responses to questions about the program
or correctness proof should be attainable. We do not expect to build
a sophisticated knowledge-based system or to address the problems of
automatic programming. Such endeavors, we believe, are premature.
We see many extensions and applications for the ideas presented here
and are quite excited about their prospects.
Bibliography--(partial)
Swinehart, Daniel C., COPILOT: A Multiprocess Approach to Interactive
Programming Systems. PhD Thesis, Stanford University, AIM-230, July
1974
Hansen, Wilfred J., Creation of Hierarchic Text with a Computer
Display. PhD. Thesis, Stanford University, Argonne National Lab
Report# ANL-7818, July 1971
Low, James R., Automatic Coding: Choice of Data Structures. PhD.
Thesis, Stanford University, AIM-242, August 1974.
Igarashi,S., London, R.L., & Luckham, D.C., Automatic Program
Verification I: A Logical Basis and its Implementation, Stanford
University, AIM-200, May 1973
Allen, C.D. & Jones, C.B., The Formal Development of an Algorithm,
IBM United Kingdom Laboratories, Hursley Park, UK, IBM Technical
Report TR.12.110, March 1973.
Cheatham, T.E., & Goldberg, J., Proceedings of a Symposium on the
High Cost of Software, Monterey, Cal, AD-777 121, September 1973
Buxton, J.N., & Randell, B., Software Engineering Techniques, Rome
Conference, April 1970.
Snowdon, R.A., Interactive Use of a Computer in the Preparation of
Structured Programs, PhD. Thesis, University of Newcastle Upon Tyne,
June 1974.
Allen, J.R., The Anatomy of a Language (tentative title) text book to
be published by McGraw-Hill.
Naur, P., Programming Languages, Natural Languages, and Mathematics.
SIGPLAN Palo Alto Conference Proceedings, 1975, pp137-148
strachey in s.e.
Dijkstra humble prog, SE, reliable proc.
Liskov, B. & Zilles, S. Specification Techniques for Data Abstraction.
Proceedings of the 1975 Conference on Reliable Software.
Curtis, Kent K., Computer Science, Federal Programs, and Nirvana,
SIGCSE Bulletin, Vol. 7, No. 1, Feb 1975, pp109-113.
Weinberg, G.M., The Psychology of Computer Programming, Van Nostrand
Reinhold Co., New York (1971).
McKeeman, W., On Preventing Programming Languages from Interfering with
Programming, IEEE Transactions on Software Engineering. 1975
Naur, P., & Randell, B. (eds.), Software Engineering, Report on the
NATO Conference, Garmisch, 1968.
Ledgard, H., The case for Structured Programming, BIT Vol. 13,
pp.45-57. 1973
Winograd, T., Breaking the Complexity Barrier again, SIGPLAN-SIGIR
Interface Meeting, SIGPLAN Notices, Vol 10, No.1, Jan 1975, pp.13-30.
Gerhart, S., Correctness-Preserving Program Transformations. SIGPLAN
Palo Alto Conference Proceedings, 1975, pp54-66.
Biography:
John Richardson Allen
Born: September 2, 1937; married with two children
Education:
1959 B.A. in mathematics (with honors)
Michigan Technological University
Houghton, Michigan
1962 M.A. in mathematics
University of California
Santa Barbara, Cal
1966-1969 graduate work in Computer Science
Stanford University
Stanford, Cal
Experience:
1959-1960 mathematical programmer
Burroughs Corp.
Sierra Madre, Cal
(machine language programming of various mathematical problems.)
1963-1965 teaching assistant
University of California
Santa Barbara, Cal
(taught trig, algebra, and first-year calculus courses)
1963-1965 mathematical and systems programmer
General Motors Research Labs
Goleta, Cal
(general mathematical programming; developed LISP for IBM7040;
modification of operating system for 7040)
1965-1966 systems programmer
Stanford A.I. Project
Stanford, Cal
(designed and developed time-sharing system for PDP-1; implemented
Culler-Fried system for this system; maintained early PDP-6 monitors)
1966-1968 student research associate
Stanford A.I. Laboratory
Stanford, Cal
(modified MAC-LISP to Stanford-LISP; maintained LISP compiler and system;
developed simple LISP editor and interfaced LISP to machine language
programs; designed LISP's big-number programs. Research in resolution
theory and interactive theorem-proving)
1969 systems programmer and researcher
Institute for Mathematical Studies in Social Sciences
Stanford University
Stanford, Cal
(continuing research in theorem-proving; general consultant on PDP-10
system, and applications of theorem-proving to educational areas)
1970-1972 Assistant Professor of Computer Science
Computer Science Dept
University of California
Los Angeles, Cal
(taught courses in basic machine and systems organization, data structures,
compiler construction, and semantics of programming languages. Organized
seminar on extensible languages, correctness and language design. Research
in theorem-proving and language design)
1972 to present Research Associate
Stanford A.I. Lab
Computer Science Dept.
Stanford Cal
(Theory and applications of theorem-proving, automatic programming,
and verification)
1973 & 1974 guest lecturer
Information Sciences Dept
University of Cal
Santa Cruz, Cal
(Each year a two-week session on LISP and data structures for the
graduate workshop)
1975 lecturer (current)
Mathematics Dept
San Jose State University
San Jose, Cal
(Teaching a one-semester graduate course on abstract programming,
LISP, data structures, and translator construction;
uses my manuscript(see below))
Publications:
Various very old technical reports from Burroughs and General Motors
McCarthy, et al., Thor - A Display Based Time Sharing System
AFIPS Vol 30, 1967
Allen & Luckham, An Interactive Theorem-Proving Program
Machine Intelligence Vol 5. 1970
Allen, The Anatomy of a Language (tentative title)
book on language design, abstract programming, data structures, LISP, ...
to be published by McGraw-Hill
Various SAIL memos: Documentation on LISP 1.6, A LISP editor (ALVINE)
and user's manual for the theorem prover.
Organizations:
ACM, SIGACT, SIGART, SIGOPS, SIGPLAN
Interests (in alphabetical order):
Education: I am quite interested in the development of better programs
for the education of future Computer Scientists. I particularly
object to the style and content of introductory (undergraduate)
courses. My book is a realization of many of my ideas about
education and some of my research interests involve development of
a programming environment to reinforce these educational policies.
I have written (and had accepted for publication) a letter to the
ACM Forum attacking the current ACM68 course structure.
Interactive systems: I am particularly interested in establishing a
display-based programming laboratory to support interactive
construction of correct or reliable programs.
Language design: One of the difficulties involved in getting clean
and correct programs written is the generally poor state of current
programming languages. Most were developed with batch-processing in
mind. Few were developed with any consideration towards on-line
construction, and even fewer considered the problem of proving
correctness of the programs. Just as the early '60s saw a
structuring of language design to conform to the syntactic
constraints of BNF, we should consider the semantic constraints of
correctness when we propose new languages.
Mathematical semantics: One of the current roadblocks to
correctness and reliability is a lack of knowledge about the "rules
of inference" for program construction. Several schools of
semantics of programming languages exist. It will be through these
investigations that we will be able to put firm foundations on
programming constructs.
I am currently writing a proposal to begin development of a
programming laboratory which will implement my current ideas on
program construction, correctness, and programming style.